A concentrated machine learning-based classification system for age-related macular degeneration (AMD) diagnosis using fundus images

The increase in eye disorders among older individuals has raised concerns, necessitating early detection through regular eye examinations. Age-related macular degeneration (AMD), a prevalent condition in individuals over 45, is a leading cause of vision impairment in the elderly. This paper presents a comprehensive computer-aided diagnosis (CAD) framework to categorize fundus images into geographic atrophy (GA), intermediate AMD, normal, and wet AMD categories. This is crucial for early detection and precise diagnosis of age-related macular degeneration (AMD), enabling timely intervention and personalized treatment strategies. We have developed a novel system that extracts both local and global appearance markers from fundus images. These markers are obtained from the entire retina and iso-regions aligned with the optical disc. Applying weighted majority voting on the best classifiers improves performance, resulting in an accuracy of 96.85%, sensitivity of 93.72%, specificity of 97.89%, precision of 93.86%, F1 of 93.72%, ROC of 95.85%, balanced accuracy of 95.81%, and weighted sum of 95.38%. This system not only achieves high accuracy but also provides a detailed assessment of the severity of each retinal region. This approach ensures that the final diagnosis aligns with the physician’s understanding of AMD, aiding them in ongoing treatment and follow-up for AMD patients.

of the disease.Wet AMD, the severe form, involves abnormal blood vessel growth beneath the retina, leading to retinal degeneration and, if left untreated, rapid loss of central vision [6][7][8] .
ML techniques hold significant promise in precisely categorizing fundus images into different AMD stages.These methods efficiently analyze extensive datasets and acquire intricate patterns and relationships from images, allowing for the identification of subtle indicators of various AMD phases.By automating the classification process, ML algorithms offer consistent and objective assessments, reducing discrepancies between observers and facilitating prompt diagnoses.A range of ML algorithms have been explored for the classification of AMD fundus images, as documented in 9 .
Traditional ML methods 10 such as random forest (RF), multilevel perceptron (MLP), decision tree (DT), logistic regression (LR), support vector machines (SVM), and K-nearest neighbors (KNN) have shown promising results in previous studies, showing exceptional performance in image classification tasks 11 .Reliable MLbased settings for classifying fundus images into GA, intermediate AMD, normal, wet AMD categories have great potential for clinical application.Such systems can help eyesight specialists to provide accurate and timely diagnosis, facilitating appropriate treatment options for patients 12 .Furthermore, it can play an important role in large-scale screening programs, enhancing early detection and intervention, and ultimately improving the visual outcome of individuals with AMD 13 .
The aim of the present study is to evaluate and investigate the accuracy of ML methods for classifying AMD stages from fundus images.Aside from revealing the entire selection process, the analysis also considers the metrics employed to examine the developed classification model.The paper further explores the findings, possible obstacles and future approaches to make the ML-based AMD classification systems more accurate and clinically useful.In this regard, this review advances knowledge in this domain and paves the way for better patient care and outcomes associated with computer-aided AMD.The following points provide a summary of the present study's contributions: • Development of a non-invasive CAD system: the study successfully developed a non-invasive CAD system for AMD using ML methods, which provides a valuable tool early diagnosis of his disease.• Improved AMD classification: through extensive research and testing, the study enhanced the accuracy and reliability of AMD classification from fundus images.This improvement ensures a more equitable classification of AMD within different stages.• Enhanced patient care: the CAD system, by automating the AMD diagnostic process and bridging gaps between caregivers, has the potential to significantly improve patient care.This advancement ensures timely and accurate inspections, contributing to enhanced overall healthcare delivery.

Paper organization
The paper is organized as follows: "Related studies" discusses related work for AMD classification."Materials" describes the research materials."Methodology" presents the proposed approach for AMD classification and its phases in details. "Experiments" presents the experimental result and discussion."Overall discussion" discusses the work and experiments."Limitations" highlight the study's limitations.Finally, "Conclusions and future directions" addresses the conclusions and future directions.

Related studies
Recently, several algorithms have been developed to address the challenge of classifying fundus images of agerelated macular degeneration (AMD) by leveraging patterns and features present in the data.These efforts have resulted in a significant body of academic work focused on the classification of AMD fundus images.Furthermore, various classification techniques and methodologies have been explored in these studies.Notable examples include the work of Bhuiyan et al. 14 , who used convolutional neural networks (CNNs) to classify Referable AMD using the AREDS dataset which contains about 116,875 images.The results show that the classification for Disease/no disease provides better results with about 0.992 accuracy and for AMD severity (4 classes) with about 0.961 accuracy.Zapata et al. 15 Proposed a classification approach using CNNs for AMD Disease/no disease using the Optretina dataset which contains about 306,302 images.This research achieved an accuracy of 0.863 and an AUC of 0.936.
Bulut et al. 16 proposed a deep learning approach (i.e., Xception model) for detecting retinal abnormalities based on color fundus images.During the analysis, the Xception model containing 50 different parameter combinations was trained.The highest accuracy achieved was 82.5%.Gayathri et al. 17 proposed an automated binary and multiclass classification of diabetic retinopathy.The proposed work focuses on the extraction of Haralick and Anisotropic Dual-Tree Complex Wavelet Transform (ADTCWT) features that can perform reliable DR classification from retinal fundus images.The evaluation results show that by applying the proposed feature extraction method, Random Forest outperforms all the other classifiers with an average accuracy of 99.7% and 99.82% for binary and multiclass classification, respectively.Furthermore, Rajagopalan et al. 18 proposed a deep convolution neural network (DCNN) architecture for the classification and diagnosis of average diabetic macular edema (DME) and drusen macular degeneration (DMD) efficiently.Firstly, the despeckling of the input OCT image is executed by the Kuan filters to remove inherent speckle noise.Furthermore, the CNN networks are tuned with hyper-parameter optimization methods.Moreover, K-fold validations are performed to guarantee full use of the datasets.Chakravorti et al. 19 proposed an efficient CNN for AMD classification.The network was trained on fundus images to classify them into the four AMD categories, achieving high accuracy with reduced computational complexity.Thomas et al. 20 developed an algorithm for the diagnosis of AMD in retinal OCT images based on the detection of RPE layers and the baseline estimate of statistical approaches and randomization.
Additionally, Zheng et al. 21designed a five-category intelligent auxiliary diagnosis model for common fundus diseases.The accuracy rates of the 3 intelligent auxiliary diagnosis models were all above 90%, and the kappa values were all above 88%.For the 4 common fundus diseases, the best results of sensitivity, specificity, and F1-scores were 97.12%, 99.52%, 96.43%, and 98.21%, respectively.Vaiyapuri et al. 22 presented a new multi-retinal disease diagnosis model using the IDL-MRDD technique to determine different types of retinal diseases.The experimental values pointed out the superior outcome over the existing techniques with a maximum accuracy of 0.963.Lee et al. 23 proposed two deep learning models, CNN-LSTM and CNN-Transformer, which use a Long-Short Term Memory (LSTM) and a Transformer, respectively with CNN, to capture the sequential information in longitudinal CFPs.The proposed models outperformed the baseline models that utilized only single-visit CFPs to predict the risk of late AMD (0.879 vs. 0.868 in AUC for 2-year prediction, and 0.879 vs. 0.862 for 5-year prediction).
Moreover, Kar et al. 24 introduced an innovative method for precise retinal blood vessel detection in fundus images.Their approach features a generative adversarial network (GAN) 25 with a unique architecture, combining a multi-scale residual convolutional neural network as the generator and a vision transformer as the discriminator.The GAN model, employing adversarial learning, achieves state-of-the-art results.Preprocessing involves contrast enhancement using a contrast-limited adaptive histogram equalization algorithm.Rigorous evaluations on multiple databases confirm the method's robustness and efficacy, outperforming existing approaches with notable accuracy scores on CHASE_DB1, DRIVE, HRF, and ARIA databases.
Similarly, Singh et al. 29 proposed an efficient glaucoma detection system using customized particle swarm optimization (CPSO) and four state-of-the-art machine-learning classifiers.The interconnected architecture involves pre-processing, segmentation, feature extraction, selection of critical features, and classification using CPSO-machine learning.The study focuses on a public dataset, Digital Retinal Images for Optic Nerve Segmentation.Unlike using all 20 extracted features, the system selects critical features based on univariate and feature importance methods.The best performance is achieved with a CPSO-K-nearest neighbor hybrid method, recording a maximum accuracy of 0.99, specificity of 0.96, sensitivity of 0.97, precision of 0.97, F1-score of 0.97, and Kappa of 0.94.Singh et al. 30,31 also addressed feature selection challenges in machine learning, focusing on glaucoma detection using benchmark datasets.The study introduces a metaheuristics-based feature selection technique employing emperor penguin optimization and bacterial foraging optimization, proposing a hybrid algorithm.From 36 features extracted from retinal fundus images, the technique minimizes the feature set while enhancing classification accuracy.Six machine learning classifiers evaluate smaller subsets provided by the optimization techniques.The hybrid optimization technique, paired with random forest, achieves the highest accuracy at 0.95410.
In summary, these studies collectively provide valuable insights into the performance of diverse classification techniques for AMD fundus image classification.The results highlight the effectiveness of deep learning methods and the importance of feature extraction techniques in achieving accurate and reliable classifications.While Random Forest and SVM often excel in terms of classification accuracy, it is crucial to consider the dataset, feature extraction methods, and evaluation metrics when interpreting specific results from these studies.
While previous investigations have provided valuable insights into the performance of diverse classification techniques for AMD fundus image classification, there remains a notable gap in the interpretation of the data by extracting both local and global features from it.Many of the mentioned studies have focused on the application of deep learning and convolutional neural networks (CNNs) for classification, achieving impressive accuracy rates.However, these studies have predominantly emphasized the utilization of deep learning methods without comprehensive exploration of feature extraction techniques that capture both local and global characteristics of fundus images.
The extraction of local features, which pertain to specific regions or structures within the image, can provide valuable information about subtle abnormalities in the retina.Similarly, global features, which encompass broader characteristics of the entire image, can offer insights into overall patterns and structures.Combining both types of features can enhance the interpretability of the classification process and potentially lead to more robust and explainable results.

Materials Patient selection and characteristics
Patient selection required the collection of retinal fundus images from a diverse group of real patients who showed symptoms associated with AMD, including different stages and types of disease.The database used in this study consists of more than 864 retinal images, including AMD.Each patient's demographic and clinical characteristics, including age, sex, and their specific AMD category, were recorded.The experimental protocols were approved by the authors' and patients' institutions: University of Louisville and Mansoura University.

Imaging techniques
The retinal imaging techniques used in this study primarily used fundus color imaging.These two-dimensional images were obtained by light-reflecting retina 32 .Complete data were obtained by performing the left and right eyes of each patient.Using state-of-the-art imaging equipment, high-quality, and high-resolution images were obtained.
• Fundus color images were taken with standard retinal imaging techniques.
• High quality/resolution imaging equipment was used to ensure image quality and detail.
• Images of the left and right eyes were obtained for each patient for all analyses.

Data collection and analysis
Data collection involved the systematic acquisition of retinal images from the fundus images maintained for this study.The collected data were intensively analyzed and subsequently extracted relevant features and attributes necessary for an ML-based diagnostic model.

Data categorization
The data used in this study included four distinct types of AMD, namely geographic atrophy (GA), central, normal AMD, and wet AMD.Each category represents a different stage or form of AMD.Classification of cases was based on clinical evaluation and expert review, which ensured classification accuracy.

Study design and ethical considerations
In the context of this research, a study design was established to investigate the use of ML-based medical diagnostic techniques for the classification of AMD using retinal fundus images.Ethical considerations were taken into account, and all procedures adhered to the relevant ethical guidelines and regulations.Informed consent was obtained from all patients involved in the study.The current study does not contain any studies with human participants and/or animals performed by any of the authors.

Consent to participate
All patients have provided informed consent for the present study.

Methodology
A detailed CAD framework for AMD analysis is introduced and shown graphically in

Data pre-processing phase
Data preprocessing is the important stage in image analysis consisting of some steps which aim at improving data quality and its preparation for further analysis 33,34 .The data preprocessing pipeline (Fig. 2) denotes a systematic way of preparing a dataset for analysis with an appropriate basis as outlined above.The suggested pipeline is split into a set of step where each is designed to enhance the upcoming classification quality and consistency: • Average CDF calculation: calculate the average CDF for the different classes to understand the pixel intensity distribution.• CLAHE enhancements: apply the CLAHE algorithm to improve image contrast and reducing noise.
• Interpolation: interpolate the individual image CDFs with the mean CDF to standardize the intensity distri- bution.• ROI extraction: use masks to extract regions of interest from images, focusing on relevant regions.
• Contour analysis: use contour analysis to adjust ROIs, define object boundaries, and calculate object proper- ties.

Average Cumulative Distribution Function for Class-Specific Pixel Intensities
An important concept in data preprocessing is the cumulative distribution function (CDF).An CDF represents the cumulative probability distribution of pixel intensities in an image, and provides valuable insight into its properties.To prepare our data, we calculate the average CDF (ACDF) for each class within our dataset.This average CDF, denoted as ACDF C (x) for class C, is computed by aggregating the individual CDFs of all images in that class.The mathematical representation is as follows: where, ACDF C (x) represents the average CDF for class C at pixel intensity x, N C is the total number of images in class C, and

Contrast limited adaptive histogram equalization (CLAHE)
Histogram equalization is a technique used to enhance image contrast by redistributing pixel intensities.However, it can inadvertently amplify noise 35   • Contrast limiting: to prevent excessive amplification of pixel values, CLAHE limits the slope of the CDF within each region.This limitation balances contrast enhancement with noise control.

Interpolate the average CDF with images
Interpolation is a vital step in making the individual images to conform to the average CDF of their classes.This guarantees that images in a given class are uniform in terms of intensity distribution and this, therefore, ensures ease of comparisons and later analysis.The interpolation operation is defined as follows: I eq (x, y) = InterpolateCDF(I(x, y), targetFreq, targetBins) where I eq (x, y) represents the equalized pixel intensity at coordinates (x, y), and InterpolateCDF(I(x, y), targetFreq, targetBins) is the interpolation function.The goal is to adjust pixel values of each image so that they align with the target CDF, effectively normalizing the data.

Extract the ROIs using masks
The ROIs are important to the analysis, since they represent parts of images that contain important information 36,37 .This is achieved by means of binary masks, where values of 1 represent the region of interest and 0 indicates background.The image is multiplied with a binary mask, that is, the pixel-wise product, where the areas of interest are selected while the background is set to zero.

Contours handling
Contour analysis is a pivotal step in refining ROIs and identifying object boundaries within images 38,39 .Contours represent the boundaries of objects and offer valuable properties like area, perimeter, and centroid 40 .The centroid (C x , C y ) of an object within a contour is calculated using moments, which are mathematical descriptors of the shape and spatial distribution of an object.Moments are defined as: C x = M 10 M00 and C y = M 01 M 00 where C x and C y are the x and y coordinates of the centroid, respectively, and M 00 is the zeroth moment (i.e., total area of the contour).Contour analysis allows us to precisely locate object boundaries, measure object properties, and define buffer regions, essential for various image analysis tasks.The contours are take at different distances from 0 (representing the original mask) up to 1500. Figure 4 represents a visualization of the extracted ROIs on a sample.

Features extraction phase
In the field of image processing and texture analysis, feature extraction plays a crucial role in quantifying the characteristics of an image.The study worked on extracting both first-order and second-order features using GLCM (Gray-Level Co-occurrence Matrix) and GLRLM (Gray-Level Run-Length Matrix) methods for each extracted contour resulted from the pre-processing step 41 .
GLCM is a statistical method used to capture the spatial relationships between pixel values in an image.It is defined based on the co-occurrence of pairs of pixel values at various distances and angles in the image 42 .GLCM is typically calculated for a given gray-level image I, with discrete gray levels {0, 1, . . ., L − 1} .Second- order texture features consider the spatial relationships between pixel pairs in an image.These features provide information about how pixel intensities are distributed relative to each other.Equations of the first-and secondorder features were presented in the appendix.

Scaling phase
The current study utilized several data scaling techniques to preprocess the dataset effectively, enhancing the performance of ML models.These scaling methods are crucial for ensuring that features are on compatible scales The x-axis is the intensity while the y-axis is the probability.and optimizing the behavior of various algorithms during the analysis 43,44 .The study employed four main scaling techniques: Standardization (Z-score Scaling), Min-Max Scaling (Normalization), and Max Absolute Scaling.
Standardization (Eq. 1) transforms data to have a mean of 0 and a standard deviation of 1.It is beneficial when dealing with features of different units, making them comparable by subtracting the mean and dividing by the standard deviation 45 .Min-Max scaling (Eq.2) scales data to a specified range, often between 0 and 1.It maintains relative relationships between data points by subtracting the minimum value and dividing by the range 46 .Max Absolute (Eq. 3) scaling scales data based on the maximum absolute value within each feature, maintaining the sign of the data while restricting it within a consistent range 47 .

Classification and optimization phase
The present study used advanced classification schemes to locate and analyze the data.These algorithms include various methods, such as LightGBM (LGBM), Histogram-based Gradient Boosting (HGB), XGBoost (XGB), AdaBoost, Random Forest (RF), Multi-Layer Perceptron (MLP), Decision Tree (DT), logistic regression (LR), support vector machine (SVM), and K-nearest neighbor (KNN) 48 .This wide selection of algorithms was selected to evaluate their performance and suitability for the particular classification task at hand [49][50][51] . (1) (2) (3) . LGBM, known for its speed and accuracy, was used to efficiently handle large data sets by creating unique vertical decision trees.Besides, calculating HGB uses histogram-based methods to optimize computing and memory usage, particularly useful for data structure 52,53 .
XGB, the versatile gradient worsting algorithm was also included because of its ability to handle complex data relationships 54 .AdaBoost is an ensemble method focusing on combining weak learners and it utilizes an iterative approach to improve classification accuracy 55 .RF is used for its robustness and flexibility for data types for classification and regression tasks 56 .
Furthermore, MLP, a neural network with multiple interconnected layers, was used to tackle complex, nonlinear data 57 .DT provided a straightforward yet powerful method for data partitioning based on key features, offering interpretability 58 .LR served as a simple yet effective baseline model, particularly suited for binary and multiclass classification tasks 59 .
SVM was leveraged for its versatility in handling high-dimensional data and linear or nonlinear classification tasks 60 .Finally, KNN offered an intuitive approach to classification, considering the majority class among the nearest neighbors of data points 61 .
The current study utilized Bayesian optimization with the Tree of Parzen Estimators (TPE) to optimize ML models, a powerful and efficient approach for hyperparameter tuning.Tree of Parzen Estimators is a probabilistic model-based optimization algorithm that effectively navigates the hyperparameter search space to find optimal configurations for ML models 62 .
TPE is particularly well-suited for hyperparameter optimization because it leverages a probabilistic model to make informed decisions about where to explore the hyperparameter space.Its working flow can be summaized as follows 62,63 : • Modeling probability distributions: TPE begins by modeling the probability distributions of the hyperpa- rameters.It maintains two distributions, one for promising configurations (i.e., exploitation) and another for less promising ones (i.e., exploration).• Sampling: the algorithm then samples hyperparameters from these distributions.It does so in a way that favors promising regions based on the exploitation distribution but also explores other areas based on the exploration distribution.• Evaluating the objective function: the sampled hyperparameters are used to train and evaluate the ML model using a chosen objective function, such as accuracy or loss.The performance of the model is recorded.• Updating distributions: based on the performance of the sampled configuration, TPE updates the probability distributions for both exploitation and exploration.It allocates more samples to regions of the hyperparameter space that have shown promise.• Iterative process: TPE iteratively repeats the process of sampling, evaluating, and updating distributions over a predefined number of iterations.This allows it to gradually refine its search and converge towards the optimal hyperparameters.• Final configuration: at the end of the optimization process, TPE provides the best-found hyperparameters, which can then be used to train the final ML model.
TPE is known for its efficiency in finding near-optimal hyperparameter configurations with a relatively small number of model evaluations.It is particularly valuable when the hyperparameter space is high-dimensional or when manual tuning becomes impractical 63 .Table 1 presents the different hyperparameters for each ML classifier in the current study.

Performance evaluation phase
In the present study, different performance metrics were used to evaluate the effectiveness of ML models in the AMD classification task.They play an important role in evaluating model quality and in determining decisions in model selection and optimization 50,64 .Confusion matrix is a tabular representation of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN).It is important for evaluating model performance and deriving other metrics such as accuracy and F1 [65][66][67] .Accuracy is a widely used metric that measures the average of correctly classified observations across all observations 68 .This provides a general understanding of the efficiency of the classification model, but can be misleading in cases of imbalanced datasets.
Sensitivity indicates how well the model is able to identify good patterns 69 .It calculates the proportion of true positives out of all actual positives.It is important when reducing false negatives is paramount, such as in clinical research 70 .Specificity examines how well the model is able to detect negative cases 71 .It calculates the proportion of true negatives out of all actual negatives.It is important when avoiding false positives is important, as seen in applications such as fraud detection 72 .
The receiver operating characteristic (ROC) is a graphical representation that illustrates a model's performance across different thresholds 73 .It plots the TP rate against the FP rate at various threshold settings, with the area under the ROC curve (AUC-ROC) quantifying the model's ability to distinguish between +ve and −ve instances 74,75 .BAC is an adjusted accuracy measure that is used for imbalanced datasets.It provides a more reliable performance assessment in skewed class distributions.Equations ( 4) to (10) presents the utilized metrics with their equations.

Experiments
For all experiments, the reported performance metrics for the various ML models are: accuracy, sensitivity, specificity, precision, F1, ROC, BAC, and WSM.Also, as mentioned, The experimental protocols were approved by the authors' and patients' institutions: University of Louisville and Mansoura University.Table 2 presents the performance results of the implemented framework across different phases, each corresponding to a mask positioned at varying distances from 0 to 1500.The distances are measured in units that align with the dimensions of the mask.The table provides a detailed overview of various evaluation metrics for each configuration, shedding light on the framework's efficacy under different spatial settings.The "Distance" column specifies the distance of the mask from its original position, and the "Combinations" column denotes the specific combinations of parameters used in each experiment.
The subsequent columns contain performance metrics, including Accuracy (ACC), Sensitivity (SNS), Specificity (SPC), Precision (PRC), F1 score (F1), Receiver Operating Characteristic (ROC) score, Balanced Accuracy (BAC), and Weighted Similarity Metric (WSM).Examining the results, notable trends emerge.For instance, as the distance increases, there is a discernible impact on metrics such as accuracy, sensitivity, and specificity.
The table indicates that the performance varies across different combinations of parameters and distances.Notably, at 1500 units, the framework achieves impressive results across all metrics, indicating its robust performance when the mask is positioned farther from its original location.Tables 5, 6, 7, 8, 9, 10 and 11 in the Appendices present the inner details of each row/distance.
The experiment conducted at a distance of 1500 units stands out as the most noteworthy configuration in Table 2.In this setting, the mask is positioned at a considerable distance from its original location, and the framework achieves outstanding performance across all evaluated metrics.With an impressive accuracy (ACC) of 96.31%, the model demonstrates a high degree of correctness in its predictions.Moreover, the sensitivity (SNS) and specificity (SPC) scores, measuring the model's ability to identify positive and negative instances, are notably high at 92.65% and 97.54%, respectively.The Precision (PRC) and F1 score (F1) further highlight the framework's precision and balance between precision and recall.The receiver operating characteristic (ROC) score, balanced accuracy (BAC), and weighted similarity Metric (WSM) collectively reinforce the exceptional performance of the model in accurately classifying instances when the mask is positioned at this substantial distance.This result underscores the model's resilience and effectiveness, particularly when faced with spatial variations in the positioning of the mask.
Majority voting: by applying the weighted majority voting on the best classifiers regarding each polygon distanced from 0 to 1500, the performance is enhanced with an overall accuracy of 96.85%, sensitivity of 93.72%, specificity of 97.89%, precision of 93.86%, F1 score of 93.72%, ROC of 95.85%, BAC of 95.81%, and WSM of 95.38%.The fact that the performance is enhanced by applying majority voting on the best classifiers regarding each polygon distanced from 0 to 1500 suggests that the model is able to learn different patterns for different polygon distances.This is important as the model is more likely to be able to generalize to new instances, even if the polygon distances in the new instances are different from the polygon distances in the training instances.
Enhancing interpretation through contour overlay: in image analysis and classification, the overlay of contours on an image plays an important role in improving the interpretation of the results.This method is particularly valuable when dealing with multiple classes, each of which is assigned a specific label.( 11) The performance results obtained by implementing the framework phases with the mask positioned at a polygon distanced from 0 to 1500, which corresponds to the mask itself, are presented.www.nature.com/scientificreports/ Figure 5 displays the overlaid contours, where most of them are correctly identified, except for two.The Wet class is depicted in red, the GA class in green, and the Normal class in yellow, all with an opacity set to 0.1.This visual representation allows for a quick and intuitive assessment of which contours have been inaccurately diagnosed.By assigning distinct colors to each class, it becomes evident which specific classes are affected by misclassifications.
While the visual identification of incorrectly diagnosed contours is vital, it is also essential to consider the broader context.Figure 6 demonstrates an overlaid image with different contours, all correctly diagnosed.This comprehensive visualization not only assures the accuracy of individual contours but also provides confidence in the overall diagnosis of the entire image.

Overall discussion
In our study, we have presented a detailed Computer-Aided Diagnosis (CAD) framework for Age-related Macular Degeneration (AMD) analysis.The CAD system is designed to assist in the diagnosis of AMD through a multi-stage process involving data acquisition, pre-processing, feature extraction, scaling, classification, and optimization.The entire methodology is encapsulated within a systematic framework, as illustrated in Fig. 1.
The framework begins with the data acquisition phase, where necessary information is obtained.This is followed by the pre-processing stage, where various techniques such as Average CDF calculation, CLAHE enhancements, interpolation, ROI extraction, and contour analysis are applied to improve data quality and prepare it for analysis.The feature extraction phase involves extracting meaningful patterns from the pre-processed data using Gray-Level Co-occurrence Matrix (GLCM) and Gray-Level Run-Length Matrix (GLRLM) methods.This step is crucial in quantifying the characteristics of the images.
In our research, the choice of feature extraction methods was driven by a consideration of the unique characteristics of the data and the specific requirements of the AMD diagnosis task.For capturing global appearance markers from the entire fundus image, we opted for techniques such as average cumulative distribution function (ACDF) calculation and contrast limited adaptive histogram equalization (CLAHE).These methods are adept at providing an understanding of pixel intensity distributions and enhancing contrast throughout the entire fundus image.This global perspective is essential for ensuring that the diagnostic model considers the overall structure and characteristics of the retina.
For local appearance markers, particularly in the optical disc section, we employed techniques such as region of interest (ROI) extraction and contour analysis.By focusing on specific regions of interest within the image and using binary masks and contour analysis, we aimed to capture detailed local information.This emphasis on local features is critical for identifying subtle patterns or abnormalities around the optical disc, contributing to a nuanced and accurate AMD diagnosis.
After feature extraction, the data goes through a scaling phase, where different scaling techniques such as standardization, min-max scaling, and max absolute scaling are applied to ensure that features are on compatible scales, optimizing the behavior of various machine learning (ML) algorithms.
The heart of the CAD system lies in the classification and optimization phase, where a variety of ML algorithms, including LightGBM, Histogram-based Gradient Boosting, XGBoost, AdaBoost, Random Forest, Multi-Layer Perceptron, Decision Tree, Logistic Regression, Support Vector Machine, and K-nearest neighbor, are employed.The Tree of Parzen Estimators (TPE) is used for hyperparameter optimization, ensuring the ML models are finely tuned for the specific task.
Finally, the performance evaluation phase employs various metrics such as confusion matrix, accuracy, sensitivity, specificity, ROC, F1-score, and balanced accuracy (BAC) to comprehensively evaluate the effectiveness of the ML models in AMD classification.These metrics are not only presented individually but are also combined into a Weighted Sum Metric (WSM) to provide an insightful measure of model performance.
Concerning the consideration of using transformers and convolutions for feature extraction, it is important to note that our decision was influenced by several factors.In the context of medical imaging, obtaining large labeled datasets for training deep learning models can be challenging.Conventional methods remain robust and effective even with smaller datasets.Additionally, the interpretability of results is a crucial consideration in medical contexts.The use of ACDF, CLAHE, and contour analysis provides transparency and interpretability, allowing healthcare professionals to understand the features contributing to a diagnosis.Furthermore, computational efficiency is a key factor, and deep learning models, particularly those involving Transformers and convolutions, may demand substantial computational resources.The simplicity of our chosen methods ensures computational efficiency without compromising the diagnostic accuracy required for AMD diagnosis.While deep learning approaches have shown success in various domains, the specific requirements of our AMD diagnosis task, including interpretability, dataset size considerations, and computational efficiency, led us to choose the outlined feature extraction methods.

Limitations
Several limitations should be noted in this study.The effectiveness of the CAD system relies on data quality and quantity, including variations in image quality and demographic representation, potentially affecting its generalizability.Labeling fundus images with AMD stages can introduce inter-observer variability, impacting ML model training and evaluation.Additionally, the study primarily focuses on broad AMD stage classification, excluding finer subtypes.External validation in diverse clinical settings is necessary to confirm real-world applicability.Regulatory approvals, clinical integration, and addressing ethical concerns are essential for the CAD system's responsible deployment.Deep learning models lack transparency, and the CAD system should always complement clinical expertise.Long-term follow-up and adaptation to geographic variability are areas for future exploration.
Despite these limitations, this research represents a significant step towards enhancing AMD diagnosis through ML.Addressing these challenges and conducting further research can contribute to the continued improvement and responsible implementation of AI-driven diagnostic tools for AMD.

Conclusions and future directions
This study has critically examined the effective application of ML methods to accurately classify the AMD stage from fundus images All aspects of AMD diagnosis are discussed, from data acquisition to preprocessing to feature extraction and selection of ML algorithms therefore.The goal was to develop a non-invasive CAD system that maximizes the accuracy and clinical utility of AMD classification.By applying weighted majority voting on the best classifiers, the performance is enhanced with an overall accuracy of 96.85%, sensitivity of 93.72%,One of the noteworthy outcomes of this study is the improvement in the classification of AMD stages.Through rigorous experimentation and analysis, an advance in the accuracy and reliability of categorizing AMD into geographic atrophy (GA), intermediate AMD, normal, and wet AMD categories has been achieved.This enhanced precision is a critical step towards facilitating appropriate treatment strategies for patients.Furthermore, intricate patterns and relationships within fundus images have been illuminated by the study.These patterns enable the identification of subtle indicators of different AMD phases, which in turn can aid in early intervention and treatment.Through the automation of the AMD diagnosis process and the reduction of interobserver discrepancies, enhanced patient care is facilitated by our CAD system.
Looking ahead, several promising directions for future research in this field can be explored.Firstly, further optimization of the ML models can be investigated.Techniques like hyperparameter tuning and the integration of more advanced techniques such as transformers and deep learning can potentially boost the system's accuracy even further.Additionally, the scalability and applicability of the CAD system to larger datasets and diverse populations can be explored.Robustness across different demographic groups and geographic regions can ensure its broad clinical utility.Moreover, the performance of the CAD system can be enhanced by the incorporation of more extensive image datasets and advanced imaging technologies.This could involve the utilization of high-resolution images and multimodal data fusion for a more comprehensive assessment.In terms of clinical application, validation studies involving real-world patient data and collaboration with healthcare institutions can validate the effectiveness of the CAD system in a clinical setting.Regulatory approval and integration into routine clinical practice would be significant milestones.Lastly, ongoing research into the underlying mechanisms of AMD, including genetics, biomarkers, and treatment options, can complement the diagnostic capabilities of the CAD system.Combining ML-based diagnosis with cutting-edge treatment strategies can usher in a new era of precision medicine for AMD. ).The difference in accuracy between Top-8 and HGB is only 0.77%.This is a relatively small difference.The difference in ROC between Top-8 and HGB is also relatively small (1.04%).
The difference in BAC between Top-8 and HGB is slightly larger (1.03%).The difference in WSM between Top-8 and HGB is also relatively small (1.15%).Table 9 shows the performance results obtained using the proposed framework with with a mask at polygon distance 1000.Both LGBM and Top-9 perform very well, with accuracies above 90%.However, LGBM slightly outperforms Top-9 on all metrics except for precision, ROC, and WSM, where Top-9 is slightly better.LGBM has a higher accuracy (90.25% vs 90.20%), sensitivity (80.57% vs 80.45%), specificity (93.50% vs 93.46%), F1-score Table 5.The performance results obtained by implementing the framework phases with the mask positioned at a polygon distanced of 0, which corresponds to the mask itself, are presented.LGBM and Top-9 is very small.If you need the highest possible accuracy and robustness, then LGBM is the better choice.However, if WSM is more important to you, then Top-9 may be the better choice.The difference in accuracy between LGBM and Top-9 is only 0.05%.This is a very small difference.The difference in ROC between LGBM and Top-9 is also very small (0.07%).The difference in BAC between LGBM and Top-9 is slightly larger (0.07%).The difference in WSM between LGBM and Top-9 is also very small (0.03%).www.nature.com/scientificreports/Table 10 shows the performance results obtained using the proposed framework with with a mask at polygon distance 1250.Both LGBM and Top-2 perform extremely well, with accuracies above 93%.They achieve identical results on all metrics, including sensitivity, specificity, precision, F1-score, ROC, BAC, and WSM.

Figure 1 .
Figure 1.The proposed framework for AMD diagnosis in the current study comprises distinct stages, including data acquisition, pre-processing, feature extraction, classification, and the utilization of Tree of Parzen Estimators (TPE) for optimization.
Figure3shows the normalized average CDFS for each class.
. Contrast Limited Adaptive Histogram Equalization (CLAHE) builds on this concept by applying histogram equalization locally, in small regions of an image.The theoretical foundation of CLAHE involves several key aspects: • Histogram equalization: traditional histogram equalization stretches the intensity values across the entire image, aiming for a uniform distribution.Mathematically, it can be represented as: CLAHE(x, y) = CDF clip (I(x, y)) where CLAHE(x, y) denotes the CLAHE-enhanced pixel intensity at coor- dinates (x, y), and CDF clip (I(x, y)) is the clipped CDF of the pixel intensity I(x, y).• Adaptive approach: CLAHE adapts the histogram equalization process by dividing the image into smaller tiles or regions.Each region is equalized independently, allowing for localized contrast enhancement.

Figure 2 .
Figure 2. Image pre-processing steps: illustration of the various stages of image preparation for dataset samples.The top row displays the original data.The middle row shows the data after applying CLAHE.The bottom row exhibits the resized data following CDF interpolation.On the right side, you can see contours at different distances from 0 (representing the original mask) up to 1500.

Figure 3 .
Figure 3. Visualization of the normalized average CDFs for each class: GA, Intermediate, Normal, and Wet.The x-axis is the intensity while the y-axis is the probability.

Figure 4 .
Figure 4. Contours handling: visualization of the extracted ROIs on a sample.It shows the the ROIS at distances from 0 (representing the original mask) up to 1500.

Figure 5 .
Figure 5. Visualization of an image with overlaid contours, where most of them are correctly identified, except for two.The Wet class is depicted in red, the GA class in green, and the Normal class in yellow, all with an opacity set to 0.1.White contours have been incorporated to enhance the visualization.

Figure 6 .
Figure 6.Visualization of an overlaid image featuring the Wet class with accurately diagnosed contours.The overlay is in red with an opacity of 0.1.White contours have been incorporated to enhance the visualization.

Table 1 .
The different hyperparameters for each utilized ML classifier in the current study.

Table 6 .
The performance results obtained by implementing the framework phases with the mask positioned at a polygon distanced of 250.

Table 7 .
The performance results obtained by implementing the framework phases with the mask positioned at a polygon distanced of 500.

Table 8 .
The performance results obtained by implementing the framework phases with the mask positioned at a polygon distanced of 750.

Table 9 .
The performance results obtained by implementing the framework phases with the mask positioned at a polygon distanced of 1000.

Table 10 .
The performance results obtained by implementing the framework phases with the mask positioned at a polygon distanced of 1250.

Table 11 .
The performance results obtained by implementing the framework phases with the mask positioned at a polygon distanced of 1500.